Abstract. Given are the $h$ facets of an abstract simplicial complex $\mathcal{SC}$ based on
$[w] = \{1, 2, \ldots, w\}$, and a fixed $k \in [w]$. Then the $f_k$ many sets $X \in \mathcal{SC}$ with
$|X| = k$ can be enumerated in time $O(f_k h^2 2^h w^3)$. When $h$ and $w$ are viewed as
fixed parameters this is output-linear time. Applications are given to combinatorial
commutative algebra and to frequent set mining.



1 Introduction 



A simplicial complex (also called set ideal) based on a set $W$ of cardinality $w$ is a family $\mathcal{SC}$ of
subsets $X$ (called faces) such that from $X \in \mathcal{SC}$, $Y \subseteq X$, follows $Y \in \mathcal{SC}$. In this article all
structures are assumed to be finite. In particular, all simplicial complexes $\mathcal{SC}$ contain maximal
faces, called the facets $F_i$ of $\mathcal{SC}$ $(1 \le i \le h)$. The facets uniquely determine $\mathcal{SC}$.

This article is about partitioning $\mathcal{SC}$ into $R$ pieces $r_i \subseteq \mathcal{SC}$. Usually $R$ is small compared to $|\mathcal{SC}|$
and by the use of wildcards each $r_i$ packs its many faces in a compact way. More specifically $r_i$
is a length $w$ multivalued row with components $0, 1, 2, e$. In Sections 3 to 6 (previewed in more
detail below) we use this gadget to assess the complexity of counting, packing or enumerating
faces, in particular faces of fixed cardinality. Sections 7 and 8 give applications of it all to
combinatorial commutative algebra and frequent set mining respectively.



In Section 2 the facets are used, in conjunction with binary decision diagrams, to calculate $N = |\mathcal{SC}|$.
In Section 3 multivalued rows enter the picture in order to refine the calculation of $N$ to the
calculation of all numbers $f_k$ of $k$-element faces $(1 \le k \le w)$. This entails applying a certain
$e$-algorithm to $h$ constraints coupled to the $h$ facets. In Section 4 we invest $\frac{1}{2}(h^2 - h)$ constraints
but get out the whole of $\mathcal{SC}$ as a disjoint union of few multivalued rows (which yields all $f_k$'s as a
side product). Section 5 shows how to get $\mathcal{SC}$ partitioned in case not the facets but the minimal
non-faces of $\mathcal{SC}$ are provided. In Section 6 we address the complexity of enumerating faces one
by one in ordinary set notation. Enumerating all faces the "naive way" is folklore and costs
$O(N^2 h w)$, and ditto the enumeration of all $k$-faces which costs $O(f_k^2 h w)$. Yet based on Section
4 the latter task can be achieved in output-linear time $O(f_k h^2 2^h w^3)$, albeit the reader may raise
an eyebrow at the factor $2^h$. Fortunately, the philosophy of fixed parameter tractability dusts
off such concerns.
2 Calculating the cardinality of a simplicial complex from its facets



Given the facets of an arbitrary simplicial complex $\mathcal{SC}$, what is the fastest way to compute its
cardinality? The question is of considerable interest, e.g. in Frequent Set Mining, but the easy
answer below seems to have escaped some authors.

Say $W = [14]$, and $\mathcal{SC}$ is defined by its facets

$F_1 = \{1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14\}$, $F_2 = \{1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14\}$,
$F_3 = \{1, 2, 3, 4, 5, 8, 9, 10, 13, 14\}$, $F_4 = \{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12\}$,
$F_5 = \{9, 10, 11, 12, 13, 14\}$, $F_6 = \{1, 2, 6, 7, 9, 10, 11, 14\}$.

Consider the Boolean function $b : \{0,1\}^{14} \to \{0,1\}$ whose $i$-th conjunction consists of the negated literals
with indices not in the $i$-th facet:

$b(x) = (\overline{x}_3 \wedge \overline{x}_4 \wedge \overline{x}_9) \vee (\overline{x}_5 \wedge \overline{x}_{10}) \vee \cdots \vee (\overline{x}_3 \wedge \overline{x}_4 \wedge \overline{x}_5 \wedge \overline{x}_8 \wedge \overline{x}_{12} \wedge \overline{x}_{13}).$

It follows that $b(x) = 1$ if and only if the support $X := \{i \mid x_i = 1\}$ belongs to $\mathcal{SC}$. Therefore

$|\mathcal{SC}'| = \texttt{SatisfiabilityCount}[b] = 7600,$

where SatisfiabilityCount is a Mathematica command that is based on the technique of
binary decision diagrams (BDD), see [K]. Generally, if a counting problem that doesn't immediately
yield to combinatorial evaluation can be recast as counting the number of models of a
Boolean function, then BDD's are likely the fastest option.
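As a sanity check on the count above, here is a minimal brute-force sketch that evaluates $b$ on all $2^{14}$ bit vectors. It deliberately swaps the BDD technique for exhaustive enumeration, which is only viable for tiny $w$; the names `FACETS`, `b` and `count_models` are ours and not taken from the article or from Mathematica.

```python
# Facets F_1, ..., F_6 of the example complex on W = [14] (from the text).
FACETS = [
    {1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14},
    {1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14},
    {1, 2, 3, 4, 5, 8, 9, 10, 13, 14},
    {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12},
    {9, 10, 11, 12, 13, 14},
    {1, 2, 6, 7, 9, 10, 11, 14},
]
W_SIZE = 14


def b(x, facets=FACETS):
    """Evaluate the DNF: term i holds iff x_j = 0 for every index j outside F_i."""
    w = len(x)
    return any(
        all(x[j - 1] == 0 for j in range(1, w + 1) if j not in facet)
        for facet in facets
    )


def count_models(facets=FACETS, w=W_SIZE):
    """Count the bit vectors x in {0,1}^w with b(x) = 1 (exhaustive, 2^w steps)."""
    total = 0
    for mask in range(1 << w):
        x = [(mask >> j) & 1 for j in range(w)]
        if b(x, facets):
            total += 1
    return total


if __name__ == "__main__":
    print(count_models())
```

For larger $w$ the exhaustive loop becomes infeasible, which is exactly where a BDD-based model counter pays off.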

3 Getting the face numbers of a simplicial complex from its 
facets 

How to compute the face numbers $f_k$ of a simplicial complex that is given by its facets? Because
incorporating a cardinality constraint into a Boolean function blows it out of proportion, BDD's
are ruled out. However, one can proceed as follows. Returning to our example, consider the
complements $H_i := W \setminus F_i$ of the facets $(1 \le i \le 6)$ and observe that for all $X \in \mathcal{P}(W)$ (the
power set) one has:

$X \notin \mathcal{SC}' \iff (\forall i)(X \not\subseteq F_i) \iff (\forall i)(X \cap H_i \neq \emptyset).$

Thus the complementary set filter $\mathcal{SF}' := \mathcal{P}(W) \setminus \mathcal{SC}'$ can be viewed as the transversal hypergraph
$Tr(\mathcal{H}')$, i.e. as the family of transversals of the hypergraph $\mathcal{H}' = \{H_1, \ldots, H_6\}$. Because each
$k$-element subset of $W$ is in exactly one of $\mathcal{SC}'$ and $\mathcal{SF}'$ we deduce that

$f_k = \binom{w}{k} - \tau_k \quad (1 \le k \le w),$

where

$\tau_k :=$ number of $k$-element transversals of the hypergraph at hand (here $\mathcal{H}'$).

The point is that $\mathcal{SF}' = Tr(\mathcal{H}')$ is representable as a disjoint union of so called $\{0, 1, 2, e\}$-valued¹
rows $r_1$ to $r_7$ (Table 1) from which the numbers $\tau_k$ are easily computable.

Table 1

Namely, each $r_i$ comprises a bunch of 0,1-strings $x$ whose supports $X$ are transversals of $\mathcal{H}'$.
Besides the don't care symbol 2, which can be freely chosen to be either 0 or 1, we use the
wildcard $ee \cdots e$ which means "at least one 1". In other words, only $00 \cdots 0$ is forbidden. If
several such $e$-bubbles appear within a row, they are distinguished by indices. Thus say
$x = (1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)$ is a member of $r_1$, generated by $e_1e_1 = 11$, $e_2e_2 = 10$,
$e_3e_3e_3e_3 = 1000$, and $e_4e_4e_4 = 011$. Correspondingly $X = \{1, 3, 4, 5, 6, 13, 14\} \in Tr(\mathcal{H}')$.

Here is the general (up to permutation of the entries) definition of a $\{0, 1, 2, e\}$-valued row:

(1) $r = (\underbrace{0, \ldots, 0}_{\alpha}, \underbrace{1, \ldots, 1}_{\beta}, \underbrace{2, \ldots, 2}_{\gamma}, \underbrace{e_1, \ldots, e_1}_{\varepsilon_1}, \ldots, \underbrace{e_t, \ldots, e_t}_{\varepsilon_t})$

Let zeros$(r)$ be the position-set of the 0's of $r$. Similarly define ones$(r)$ and twos$(r)$. Furthermore,
let $Pos(r, e_i)$ be the position set of $e_i \cdots e_i$, thus $|Pos(r, e_i)| = \varepsilon_i$. For instance, the row $r_2$ in
Table 1 has ones$(r_2) = \{5, 9\}$, $Pos(r_2, e_3) = \{6, 7, 11, 12\}$. The cardinality of a row $r$, i.e. the
number of characteristic vectors contained in it, clearly is

(2) $|r| = 2^{\gamma} \cdot (2^{\varepsilon_1} - 1) \cdots (2^{\varepsilon_t} - 1).$

Calculating the number Card$(r, k)$ of $k$-element members of $r$ is more subtle² but one verifies
ad hoc that say Card$(r_6, 8) = 1$ and Card$(r_6, 6) = 8$. Generally

(3) $\tau_k = \sum_{i=1}^{R} \mathrm{Card}(r_i, k),$

where $R$ will always denote the number of final rows obtained by the $e$-algorithm. In our example
$R = 7$ and Card$(r_i, k)$ is the entry in the $k$-th row and $i$-th column of Table 2. (Here we ignore

¹ Interchangeably we speak of multivalued rows (or just rows) akin to the more general setting of [W2].
² By [W1, Thm. 1] it costs $O(k w^2 \log^2 w)$. Actually calculating all of Card$(r, 1)$, Card$(r, 2)$, ..., Card$(r, k)$ costs
$O(k w^2 \log^2 w)$.
Table 2

As expected the $\tau_k$'s sum up to $|\mathcal{SF}'| = 8784$ (and so do the column sums $|r_i|$), and the $f_k$'s sum
up to $|\mathcal{SC}'| = 7600$. Of course $8784 + 7600 = 2^{14}$.
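Formula (2) and the numbers Card$(r, k)$ can be illustrated with a short sketch. It multiplies out the generating polynomial $x^{\beta}(1+x)^{\gamma}\prod_i((1+x)^{\varepsilon_i} - 1)$ by plain convolution; this is a naive stand-in chosen for readability, not the method of [W1, Thm. 1], and the function names are ours.

```python
from math import comb


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = degree)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] += a * c
    return out


def row_cardinality(twos, bubbles):
    """Formula (2): |r| = 2^gamma * prod(2^eps_i - 1); 0's and 1's do not matter."""
    size = 2 ** twos
    for eps in bubbles:
        size *= 2 ** eps - 1
    return size


def row_size_distribution(ones, twos, bubbles):
    """Return a list c with c[k] = Card(r, k) for a row with the given shape."""
    poly = [0] * ones + [1]                                   # x^beta
    poly = poly_mul(poly, [comb(twos, j) for j in range(twos + 1)])  # (1+x)^gamma
    for eps in bubbles:
        # (1+x)^eps - 1: every nonempty choice inside the e-bubble
        poly = poly_mul(poly, [0] + [comb(eps, j) for j in range(1, eps + 1)])
    return poly


if __name__ == "__main__":
    # A row with nine 2's and one e-bubble of length three.
    print(row_cardinality(9, [3]), row_size_distribution(0, 9, [3]))
```

Summing the coefficient lists of all final rows position by position gives the numbers in formula (3).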

For each hypergraph $\mathcal{H}$ on $[w]$ one has $Tr(\mathcal{H}) \neq \emptyset$ since $[w] \in Tr(\mathcal{H})$, and using the $e$-algorithm
to partition $Tr(\mathcal{H})$ into $R > 0$ many multivalued rows costs $O(R h^2 w^2)$; see the proof of Thm. 3
in [W1]. Using this representation of $Tr(\mathcal{H})$ to calculate all numbers $f_k$ as in the toy example
amounts to a total cost of $O(R h^2 w^3 \log^2 w)$ since calculating Card$(r, 1), \ldots,$ Card$(r, w)$ for any
fixed row $r$ costs $O(w^3 \log^2 w)$.

If one wants $f_k$ for only one particular $k \in [w]$ the method above may work well in practice but
forbids a theoretic assessment since final rows $r$ with Card$(r, k) = 0$ are useless and must be
prevented. (If all $f_k$ are sought, no $r$ is useless since always Card$(r, k) \neq 0$ for some $k$.) Here's
what can be proven.



Theorem 1: Given are the facets $F_1, \ldots, F_h \subseteq [w]$ of a simplicial complex $\mathcal{SC}$ and
an integer fee [w]. Then calculating fk costs 0{R2'^hw'^k) = 0{fk2^hw'^k). 



Proof. Put ST = \ SC. According to [Wl, Thm.4(a)] the number of /c-element members 
of ST = Tr{H) costs 0{R2^hw%. This yields fk = {D - r^. □ 

As to the ugliness of the factor $2^h$, see the remarks before Theorem 5. Notice that enumerating
all $k$-faces from the facets requires new ideas (Section 6).
4 Partitioning a simplicial complex, given its facets



The face lattice of a simplicial complex $\mathcal{SC}$ based on $W$ is $\mathcal{L} := \mathcal{SC} \cup \{W\}$, ordered by inclusion.
For all $X, Y \in \mathcal{L}$ the meet $X \wedge Y$ is $X \cap Y$; the join $X \vee Y$ is $X \cup Y$ if $X \cup Y \in \mathcal{SC}$, and $W$ otherwise.
The combinatorial face lattice enumeration problem is the following: Given the facets $F_1, \ldots, F_h$,
calculate the diagram of $\mathcal{L}$, which thus entails a listing of all covering pairs of faces. Letting
$N = |\mathcal{L}|$ (or $N = |\mathcal{SC}|$) this can be achieved [KP, Thm. 5] in time $O(N \min\{h, w\}(|F_1| + \cdots + |F_h|))$,
which is $O(N h^2 w)$ if $h \le w$ and $O(N h w^2)$ if $h > w$.

What about merely enumerating $\mathcal{L}$ and dispensing with its edges? The naive way (Theorem 4)
takes time $O(N^2 h w)$ but $\mathcal{L}$ being a closure system one can do it in time $O(N h w^2)$ and space
$O(hw)$; as opposed to space $O(N \min\{h, w\})$ in [KP]. In fact, let $\mathcal{C} \subseteq \mathcal{P}([w])$ be any closure
system which is given by its associated closure operator $c : \mathcal{P}([w]) \to \mathcal{P}([w])$. If computing an
arbitrary closure $c(X)$ costs at most $f(w)$ then enumerating $\mathcal{C}$ can be done³ in time $O(N w f(w))$.
It is clear that in our situation $f(w) = O(hw)$.

In this Section we present a method, very much different from [KP] or [GR], which delivers SC 
in a compact way, i.e. not one by one. 

We henceforth use the symbol $\uplus$ for disjoint union. For $1 \le p \le h$ let $\mathcal{SC}_p \subseteq \mathcal{SC}$ be the
simplicial complex generated by $F_1, \ldots, F_p$, so $\mathcal{SC}_p = \mathcal{P}(F_1) \cup \cdots \cup \mathcal{P}(F_p)$. By induction assume
that $\mathcal{SC}_p = r_1 \uplus r_2 \uplus \cdots \uplus r_m$ with $\{0, 1, 2, e\}$-valued rows $r_i$. The basic procedure to extend this
to a representation of $\mathcal{SC}_{p+1}$ is as follows. We iteratively shrink $\mathcal{F}_{p+1}[0] := \mathcal{P}(F_{p+1})$ (viewed as
$\{0, 2\}$-valued row) in order to make it disjoint from $\mathcal{SC}_p$, i.e. $\mathcal{F}_{p+1}[0] \supseteq \mathcal{F}_{p+1}[1] \supseteq \mathcal{F}_{p+1}[2] \supseteq \cdots$
until $\mathcal{F}_{p+1}[p]$ is such that

(4) $\mathcal{SC}_{p+1} = \mathcal{SC}_p \cup \mathcal{F}_{p+1}[0] = \mathcal{SC}_p \uplus \mathcal{F}_{p+1}[p].$

Since $\mathcal{F}_{p+1}[p]$ itself arises as disjoint union of multivalued rows, the induction hypothesis will
carry over from $\mathcal{SC}_p$ to $\mathcal{SC}_{p+1}$. It is clear that $\mathcal{F}_{p+1}[p]$ satisfies (4) if we set

(5) $\mathcal{F}_{p+1}[i] := \{X \in \mathcal{F}_{p+1}[i-1] : X \notin \mathcal{P}(F_i)\} \quad (1 \le i \le p).$

For the simplicial complex $\mathcal{SC} = \mathcal{SC}_6$ from Section 2 the procedure unfolds as follows (Table 3).

³ This follows at once from [GR] and earlier work of Ganter, albeit the $O(N w f(w))$ bound is not explicitly
stated in [GR]. We mention that also closed sets "up to symmetry" and other generalizations are dealt with in
[GR].
Table 3

Starting with

$r_1 := \mathcal{F}_1[0] = \mathcal{P}(F_1) = (2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2)$

the only way for $X \in \mathcal{F}_2[0] = \mathcal{P}(F_2)$ not to be a member of $r_1$ is to have $X \cap \mathrm{zeros}(r_1) =
X \cap \{3, 4, 9\} \neq \emptyset$. Hence

$r_2 := \mathcal{F}_2[1] = (2, 2, e_1, e_1, 0, 2, 2, 2, e_1, 0, 2, 2, 2, 2).$

Similarly $\mathcal{F}_3[1]$ arises from $\mathcal{F}_3[0]$ by putting $e_1e_1e_1$ on positions 3, 4, 9; and $r_3 := \mathcal{F}_3[2]$ arises
from $\mathcal{F}_3[1]$ by putting $e_2e_2$ on zeros$(\mathcal{P}(F_2)) = \{5, 10\}$. So far

$\mathcal{SC}_3 = \mathcal{P}(F_1) \cup \mathcal{P}(F_2) \cup \mathcal{P}(F_3) = r_1 \uplus r_2 \uplus r_3,$

and we continue the same way up to $\mathcal{F}_6[0]$ for which the subset $\mathcal{F}_6[5]$ can no longer be represented
as a single $\{0, 1, 2, e\}$-valued row. Instead we note that the partition

$\mathcal{F}_6[4] = \{X \in \mathcal{F}_6[4] : X \cap \{6, 7\} \neq \emptyset\} \uplus \{X \in \mathcal{F}_6[4] : X \cap \{6, 7\} = \emptyset\}$

displays as follows in terms of multivalued rows:
The rows $r^+$ and $r^-$ are the candidate sons of $r$. Generally [W2, Sec. 4] there can be up to $t$
(see (1)) candidate sons and they are always such that the pending constraint can be smoothly
imposed upon them. Here $r_6 := r^+ \subseteq \mathcal{F}_6[5]$ and $r^- \cap \mathcal{F}_6[5]$ can be written as a multivalued row
(namely $r_7$ in Table 3). Thus $\mathcal{F}_6[5] = r_6 \uplus r_7$, and so $\mathcal{SC}_6 = r_1 \uplus \cdots \uplus r_7$. The latter equality is
also supported by the calculation

$|r_1| + \cdots + |r_7| = 2048 + 3584 + 672 + 1260 + 9 + 24 + 3 = 7600$

where 7600 is the number obtained in Section 2.
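The shrinking procedure (4), (5) can be mimicked set by set. The sketch below does not use multivalued rows at all; it simply filters each power set $\mathcal{P}(F_{p+1})$ by condition (5), so it only serves to confirm the piece sizes for the running example. All names are ours.

```python
from itertools import combinations

# Facets F_1, ..., F_6 of the example complex from Section 2.
FACETS = [
    {1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14},
    {1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14},
    {1, 2, 3, 4, 5, 8, 9, 10, 13, 14},
    {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12},
    {9, 10, 11, 12, 13, 14},
    {1, 2, 6, 7, 9, 10, 11, 14},
]


def powerset(s):
    """Yield all subsets of s as frozensets."""
    items = sorted(s)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)


def disjoint_pieces(facets):
    """Piece number p+1 holds the subsets of F_{p+1} lying in no P(F_i), i <= p."""
    pieces = []
    for p, facet in enumerate(facets):
        earlier = facets[:p]
        pieces.append([x for x in powerset(facet)
                       if all(not x <= f for f in earlier)])
    return pieces


if __name__ == "__main__":
    print([len(piece) for piece in disjoint_pieces(FACETS)])
```

The last piece has 27 members, which matches $|r_6| + |r_7| = 24 + 3$ in the text.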
In general $\mathcal{F}_p[i-1]$ may be $\rho_1 \uplus \rho_2 \uplus \cdots$ and upon enforcing each row's disjointness from $\mathcal{P}(F_i)$
each $\rho_j$ may in turn decay into a disjoint union of rows. The union of the latter rows yields $\mathcal{F}_p[i]$.
The number $R$ of disjoint rows that eventually make up $\mathcal{SC}$ can only be bounded by $N = |\mathcal{SC}|$
but in practice often $R \ll N$.



Theorem 2: Given are the facets $F_1, \ldots, F_h \subseteq [w]$ of a simplicial complex $\mathcal{SC}$. Mentioned
method (based on the $e$-algorithm) to partition $\mathcal{SC}$ into $R$ pieces costs $O(R h^2 w^2)$.



Proof: Since $\{F_1, \ldots, F_h\}$ is an antichain (i.e. $F_i \not\subseteq F_j$ for $i \neq j$) no $\mathcal{P}(F_{p+1}) = \mathcal{F}_{p+1}[0]$ is
contained in $\mathcal{P}(F_1) \cup \cdots \cup \mathcal{P}(F_p)$. Hence $\mathcal{F}_{p+1}[p] \neq \emptyset$ in (4). Representing $\mathcal{F}_{p+1}[p]$ by a disjoint
union of $R_{p+1} > 0$ many $\{0, 1, 2, e\}$-valued rows entails imposing on $\mathcal{F}_{p+1}[0]$ the $p$ constraints
$X \cap H_i \neq \emptyset$ ($H_i := W \setminus F_i$ for $1 \le i \le p$), and this costs $O(R_{p+1} p^2 w^2)$ as noted in
Section 3. In view of $R = 1 + R_2 + \cdots + R_h$ the claim now follows from

$\sum_{p=1}^{h-1} O(R_{p+1} p^2 w^2) = \sum_{p=1}^{h-1} O(R_{p+1} h^2 w^2) = O(R h^2 w^2).$

□

Why settle for the face numbers $f_k$ (Section 3) if we can get the face numbers and $\mathcal{SC}$ in conveniently
compact form? Of course the answer is: The former requires processing $h$ constraints,
the latter $1 + 2 + \cdots + (h-1) = \frac{1}{2}(h^2 - h)$ constraints.



5 Partitioning a simplicial complex, given its minimal nonfaces 

Suppose $\mathcal{SC}$ is a simplicial complex whose minimal non-faces $G_1, \ldots, G_q$ are known, i.e. the $G_i$'s
are the generators of the set filter $\mathcal{SF} = \mathcal{P}(W) \setminus \mathcal{SC}$. Then $\mathcal{SC}$ consists of all noncovers $X$ of
$\{G_1, \ldots, G_q\}$, i.e. of all sets $X \subseteq W$ such that

(6) $(\forall 1 \le i \le q)\ \ X \not\supseteq G_i.$

Because (6) is equivalent to

(7) $(\forall 1 \le i \le q)\ \ (W \setminus X) \cap G_i \neq \emptyset,$

one can employ⁴ the transversal $e$-algorithm to enumerate all noncovers of $\{G_1, \ldots, G_q\}$, and
whence enumerate $\mathcal{SC}$; this costs $O(R q^2 w^2)$ where $R$ is the number of final rows. Let us summarize.

⁴ In Section 3 we generated $\mathcal{SF}$ (which suffices for the face numbers of $\mathcal{SC}$) by imposing $h$ constraints
$(W \setminus F_i) \cap X \neq \emptyset$. Here in contrast we generate $\mathcal{SC}$ by imposing $q$ constraints $(W \setminus X) \cap G_i \neq \emptyset$. Both tasks can
be achieved by the $e$-algorithm. However, as mentioned in previous publications, for tasks such as the latter it
is more succinct to use a dual version of the $e$-algorithm, called $n$-algorithm. Its output is a disjoint union of
$\{0, 1, 2, n\}$-valued rows where the wildcard $nn \cdots n$ means at least one 0 here.
Theorem 3: Given are the minimal non-faces $G_1, \ldots, G_q \subseteq [w]$ of a simplicial complex
$\mathcal{SC}$. Mentioned method (based on the $n$-algorithm) to partition $\mathcal{SC}$ into $R$ pieces costs
$O(R q^2 w^2)$.



For example, for the simplicial complex $\mathcal{SC}$ from Section 2 one computes 74 minimal non-faces
$G_i$. Applying the noncover $n$-algorithm to the $G_i$'s displays $\mathcal{SC}$ as a disjoint union of 37
$\{0, 1, 2, n\}$-valued rows. As another example, let $\mathcal{SC}$ be the simplicial complex of all independent
sets of a matroid $M$ on $[w]$, given by its minimal non-faces (the circuits of $M$). Then $\mathcal{SC}$ can be
partitioned with the noncover $n$-algorithm. In this particular case the facets of $\mathcal{SC}$ (the bases of
$M$) are equicardinal.
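The equivalence of "face" and "noncover" via conditions (6) and (7) is easy to test by brute force on small ground sets. The following sketch encodes subsets as bitmasks and computes the minimal non-faces exhaustively; it is an illustration under our own naming and has nothing to do with the $n$-algorithm itself.

```python
def to_mask(s):
    """Encode a subset of {1, ..., w} as a bitmask (element j -> bit j-1)."""
    m = 0
    for j in s:
        m |= 1 << (j - 1)
    return m


def is_face(mask, facet_masks):
    """A set is a face iff it is contained in at least one facet."""
    return any((mask & ~f) == 0 for f in facet_masks)


def minimal_non_faces(facet_masks, w):
    """Exhaustively list the non-faces all of whose one-element deletions are faces."""
    result = []
    for mask in range(1 << w):
        if is_face(mask, facet_masks):
            continue
        if all(is_face(mask & ~(1 << j), facet_masks)
               for j in range(w) if (mask >> j) & 1):
            result.append(mask)
    return result


def is_noncover(mask, generators, w):
    """Condition (7): the complement of X meets every generator G_i."""
    complement = ((1 << w) - 1) & ~mask
    return all((complement & g) != 0 for g in generators)


if __name__ == "__main__":
    facets = [to_mask({1, 2}), to_mask({2, 3})]
    gens = minimal_non_faces(facets, 3)
    print(gens)  # the single minimal non-face {1, 3} as a bitmask
```

On the toy complex with facets $\{1,2\}$ and $\{2,3\}$ the only minimal non-face is $\{1,3\}$, and a set is a face exactly when `is_noncover` holds.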

Unfortunately, if the minimal non-faces $G_i$ are unknown, it is tough finding them. Specifically,
calculating the $G_i$'s from the facets $F_i$ amounts to the hard problem [EMG] of dualizing a
monotone Boolean function. It may thus be faster to confront the $\frac{1}{2}(h^2 - h)$ many constraints
from Section 4 and get $\mathcal{SC}$ as disjoint union of $\{0, 1, 2, e\}$-valued rows rather than the $\{0, 1, 2, n\}$-valued
rows of Theorem 3. Either $h$ or $q$ can be much larger than the other and usually one
cannot tell in advance.

Enumerating all $k$-faces of $\mathcal{SC}$ from its minimal non-faces is, like Theorem 3, nothing really new
(i.e. is akin to the proof of Theorem 1 and follows at once from [W1, Thm. 4(a')]). In the next
section we strive to do the same when the facets are given.



6 Enumerating the $k$-faces one by one

For practical purposes (such as Sections 7 and 8) the multivalued rows discussed in Sections 3
to 5 are just fine. However, for theoretic deliberations we may wish to handle the faces one by
one. Specifically, in Theorem 4 we assess the "naive way" to output in ordinary set notation (a)
all the faces of $\mathcal{SC}$ from its facets, and (b) just the $k$-faces. In Theorem 5 we redo (b) based on
Section 4. For $k \in [w]$ recall that $f_k$ is the number of $k$-faces.



Theorem 4 (folklore): Let $F_1, \ldots, F_h \subseteq [w]$ be the facets of a simplicial complex $\mathcal{SC}$, and let
$k \in [w]$ be fixed.

(a) Enumerating all $N$ faces of $\mathcal{SC}$ can be done in time $O(N^2 h w)$.

(b) Similarly, enumerating all $k$-faces of $\mathcal{SC}$ can be done in time $O(f_k^2 h w)$.



Proof. (a) Listing all members of $\mathcal{P}(F_1)$, followed by all members of $\mathcal{P}(F_2)$, and so forth, yields
a list of length $\le Nh$ and costs $O(Nhw)$. To prune the list of duplications we compare each $X$
in the list with the $N'$ many distinct faces found so far. That costs $O(N'w) = O(Nw)$. Hence
the total cost is

$O(Nhw) + Nh \cdot O(Nw) = O(N^2 h w).$

(b) Of course, enumerating all $k$-subsets of a set (e.g. in Gray code manner) is an easy matter.
Thus again processing $\mathcal{P}(F_1)$ up to $\mathcal{P}(F_h)$ in turn, compile a list of length at most $f_k h$ that
contains all $k$-faces of $\mathcal{SC}$ and then discard duplications. □
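The naive procedure of Theorem 4(b) fits in a few lines. Note one deliberate deviation in this sketch: duplicates are discarded with a hash set rather than by the pairwise comparison used in the proof, so its expected running time is better than the $O(f_k^2 h w)$ accounted for above. The function name `k_faces` is ours.

```python
from itertools import combinations


def k_faces(facets, k):
    """List every k-face once, scanning the facets in the given order."""
    seen = set()
    faces = []
    for facet in facets:
        for candidate in combinations(sorted(facet), k):
            if candidate not in seen:      # hash-based duplicate test
                seen.add(candidate)
                faces.append(candidate)
    return faces


if __name__ == "__main__":
    print(k_faces([{1, 2, 3}, {2, 3, 4}], 2))
```

For the two facets $\{1,2,3\}$ and $\{2,3,4\}$ and $k = 2$ the shared edge $\{2,3\}$ is produced by both facets but reported only once.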



By Theorem 2 it costs $O(R h^2 w^2)$ to write $\mathcal{SC}$ as disjoint union of $\{0, 1, 2, e\}$-valued rows
$r_1, r_2, \ldots, r_R$. Writing down all members of $r_i$ in set notation is a trivial matter which costs
$O(|r_i| w)$. Hence the overall cost is $O(R h^2 w^2) + O(|r_1| w) + \cdots + O(|r_R| w) = O(R h^2 w^2 + N w)$.
Albeit often $R \ll N$, unfortunately $R$ can only be bounded by $N$, and so the last $O(\cdots)$ becomes
$O(N h^2 w^2)$. This is (formally) worse than the $O(N h w^2)$ algorithm of [GR] but other than [GR]
our approach can be tailored to enumerate in output-linear time all faces of fixed cardinality.

Thus for $k \in [w]$ put $\mathcal{SC}[k] := \{X \in \mathcal{SC} : |X| = k\}$. Unless $k = w$ (Theorem 1), it is now possible
that for some indices $p$ the $k$-cut $\mathcal{P}(F_{p+1}, k) := \mathcal{P}(F_{p+1}) \cap \mathcal{SC}[k]$ is contained in $\mathcal{P}(F_1) \cup \cdots \cup \mathcal{P}(F_p)$.
For the sake of the upcoming proof we need to prevent this. The obvious way is to line up the
power sets of the facets in arbitrary order and discard $\mathcal{P}(F_{p+1})$ whenever $\mathcal{P}(F_{p+1}, k)$ is contained
in the union of the previous power sets. (That e.g. takes place when $\mathcal{P}(F_{p+1}, k) = \emptyset$ due to
$|F_{p+1}| < k$.) On the face of it this test costs a whopping $O(w^{k+1} h^2)$ which is worse than the
time $O(w^{k+1} h)$ it takes to produce all $k$-sets $X \subseteq [w]$ from scratch and check whether $X \in \mathcal{SC}$.

Fortunately, testing the containment of $\mathcal{P}(F_{p+1}, k)$ can alternatively be done in time $O(2^p w)$
which is better⁵ than $O(w^{k+1} p)$. Actually, by the philosophy of fixed parameter tractability
[DW] we need not defend the factor $2^h$. Only the parameters $k$ and $f_k$ matter. The former
doesn't appear at all in Theorem 5(a) and the latter has come down from quadratic in Theorem
4 to linear in Theorem 5(a).

Testing whether $\mathcal{P}(F_{p+1}, k)$ is contained in $\mathcal{P}(F_1) \cup \cdots \cup \mathcal{P}(F_p)$ amounts to checking (recall the
proof of Theorem 1) whether some $\{0, 2\}$-valued row $r$ (namely $\mathcal{P}(F_{p+1})$) contains a $k$-element
transversal of some $p$-element hypergraph $\mathcal{H}$. For later benefit we illustrate the more general
setting of a $\{0, 1, 2, e\}$-valued row $r$ as shown in Table 4. Say $\mathcal{H} = \{H_1, H_2\}$ with $H_1 =
\{1, 7, 9\}$, $H_2 = \{4, 5, 7\}$, and put $k = 4$. For any $Y \in r$ define three properties $a_1, a_2, a_3$ which $Y$
may or may not possess. Namely $a_1$ holds if $Y \cap H_1 \neq \emptyset$, $a_2$ holds if $Y \cap H_2 \neq \emptyset$, and $a_3$ holds if
$|Y| = 4$. If $N(a_1 a_2 a_3)$ denotes the number of $Y \in r$ that enjoy all properties, we need to decide
whether⁶ or not $N(a_1 a_2 a_3) > 0$. By inclusion-exclusion

$N(a_1 a_2 a_3) = |r| - N(\overline{a}_1) - N(\overline{a}_2) - N(\overline{a}_3) + N(\overline{a}_1 \overline{a}_2) + N(\overline{a}_1 \overline{a}_3) + N(\overline{a}_2 \overline{a}_3) - N(\overline{a}_1 \overline{a}_2 \overline{a}_3)$

where e.g. $N(\overline{a}_1 \overline{a}_3)$ denotes the number of $Y \in r$ that violate $a_1$ and $a_3$.
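For the special case of a $\{0, 2\}$-valued row $r = \mathcal{P}(F_{p+1})$ the inclusion-exclusion count reduces to binomial coefficients, because the sets avoiding a chosen family of hyperedges are just the subsets of what remains of $F_{p+1}$. The sketch below covers only this special case (not general $\{0,1,2,e\}$-valued rows), handles the cardinality condition directly through the binomial coefficient rather than as a separate property, and uses names of our own choosing.

```python
from itertools import combinations
from math import comb


def count_new_k_faces(new_facet, old_facets, k):
    """Number of k-subsets of new_facet contained in none of old_facets.

    Inclusion-exclusion over the 2^p subfamilies of the hyperedges
    H_i = new_facet minus old_facets[i].
    """
    hyperedges = [set(new_facet) - set(f) for f in old_facets]
    p = len(hyperedges)
    total = 0
    for size in range(p + 1):
        for chosen in combinations(range(p), size):
            avoided = set().union(*(hyperedges[i] for i in chosen))
            # k-subsets of new_facet that miss every chosen hyperedge
            total += (-1) ** size * comb(len(new_facet) - len(avoided), k)
    return total


if __name__ == "__main__":
    f6 = {1, 2, 6, 7, 9, 10, 11, 14}
    earlier = [
        {1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14},
        {1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14},
        {1, 2, 3, 4, 5, 8, 9, 10, 13, 14},
        {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12},
        {9, 10, 11, 12, 13, 14},
    ]
    print([count_new_k_faces(f6, earlier, k) for k in range(9)])
```

The $k$-cut of $F_{p+1}$ is contained in the union of the earlier power sets exactly when the returned count is zero; for the facet $F_6$ of the running example this happens for $k = 3$ but not for $k = 4$.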
Table 4

⁵ Switching from $p < h$ to $h$, even for $h = 10^{10} k$ one has $2^h / w^{k+1} \to 0$ as $w \to \infty$. One may consider it a
"contrived" situation when not just $h > k$ but $h > f_k$. It is an amusing exercise (trivial for $k = 1$) to show that
$h$ cannot be polynomially bounded by $f_k$.

⁶ One can often predict that $N(a_1 a_2 \cdots a_m) > 0$ without evaluating $2^m$ terms; it e.g. follows (Boole-Bonferroni)
from $|r| - N(\overline{a}_1) - \cdots - N(\overline{a}_m) > 0$.

The values 54, 9, 6, 1 in Table 4 are obvious. Furthermore $N(\overline{a}_3) = |r| - N(a_3) = 54 -
\mathrm{Card}(r, 4) = 46$ and similarly $N(\overline{a}_1 \overline{a}_3) = N(\overline{a}_1) - N(\overline{a}_1 a_3) = 9 - \mathrm{Card}(r_1, 4) = 5$ as well
as $N(\overline{a}_2 \overline{a}_3) = 4$, $N(\overline{a}_1 \overline{a}_2 \overline{a}_3) = 0$. It follows that

$N(a_1 a_2 a_3) = 54 - 9 - 6 - 46 + 1 + 5 + 4 - 0 = 3.$

We leave it to the reader to pinpoint these three sets. As previously mentioned, when $r$ has
length $w$ computing Card$(r, k)$ costs $O(k w^2 \log^2 w) = O(k w^3)$. Therefore:

(8) Deciding whether a multivalued row $r$ is feasible in the sense of containing a $k$-element
transversal of some $p$-element hypergraph, costs $O(2^p k w^3)$.



Theorem 5: Let $F_1, \ldots, F_h \subseteq [w]$ be the facets of a simplicial complex $\mathcal{SC}$.

(a) Then all $k$-faces can be enumerated in time $O(f_k 2^h h^2 w^3)$.

(b) Under the additional proviso that h < k the A;-faces can be enumerated in time 0{fkh^w'^). 



Proof. We take the proof of Theorem 2 as a template. For all $1 \le p < h$ we discard $F_{p+1}$
if $\mathcal{P}(F_{p+1}, k)$ is contained in $\mathcal{P}(F_1) \cup \cdots \cup \mathcal{P}(F_p)$. By (8) (applied to the $\{0, 2\}$-valued row
$r = \mathcal{P}(F_{p+1})$) that costs

(9) $\sum_{p=1}^{h-1} O(2^p k w^3) = O(2^h h k w^3).$

Keeping (9) in mind we may now assume that each $\mathcal{F}_{p+1}[0] = \mathcal{P}(F_{p+1})$ contains $k$-faces not
in $\mathcal{P}(F_1) \cup \cdots \cup \mathcal{P}(F_p)$. Hence $U_{p+1} := \mathcal{F}_{p+1}[p] \cap \mathcal{SC}[k] \neq \emptyset$ for all $1 \le p < h$. This yields
$\mathcal{SC}[k] = U_1 \uplus U_2 \uplus \cdots \uplus U_h$ where $U_1 := \mathcal{F}_1[0] \cap \mathcal{SC}[k]$. Since $u_{p+1} := |U_{p+1}|$ is nonzero⁷ for
$1 \le p < h$ we may invoke [W1, Thm. 4(a')] and conclude that the sets in $U_{p+1}$ can be enumerated
one by one in time $O(u_{p+1} 2^p p w^3)$. Summing up gives

$\sum_{p=1}^{h-1} O(u_{p+1} 2^p p w^3) = \sum_{p=1}^{h-1} O(f_k 2^p p w^3) = \sum_{p=1}^{h-1} O(f_k 2^h h w^3) = O(f_k 2^h h^2 w^3).$

Since this swallows $O(2^h h k w^3)$ from (9), (a) is shown.

As to (b), when h < k the cost of 0{2^kw'^) in (8) plummets to 0{hw). The argument (up to 

h-1 

n o e duality) is given in the proof of [W2, Thm. 4]. Hence the sum in (9) becomes 

p=i 

0{h?w). According to [Wl, Thm. 4(b)] the cost of enumerating the sets in f/p+i plummets from 

h-1 

0{upj^i2^pw^) to 0{upj^ip'^w'^). Summing up yields ©(up+ip^w^) = 0{fkh^w^), and this 

p=i 

again swallows 0{h?w). □ 

⁷ Since $O(u_{p+1} 2^p p w^3) = 0$ for $u_{p+1} = 0$ the condition $u_{p+1} > 0$ cannot be dropped in [W1, Thm. 4(a')]. We
note that in the proof of the latter inclusion-exclusion (as above) is repeatedly applied in order to have only
feasible rows throughout the $e$-algorithm.
7 Application to combinatorial commutative algebra



Following [MS] in Section 7 we adopt the notation $\Delta$ instead of $\mathcal{SC}$. We shall touch upon the
$h$-polynomial, reduced homology groups, and the link of a face.

7.1 As to the $h$-polynomial

w 
i=l 

of $\Delta$, it is immediately computed from the face numbers $f_i$ of $\Delta$ and has many applications [S].
Here all $f_i$ are required and they⁸ can be calculated as in Section 3.

7.2 The $i$-th reduced homology $\widetilde{H}_i(\Delta; K)$ over the field $K$ is defined as follows. Let $V_j$ be a
vector space which has all $j$-faces as a $K$-basis. For certain $K$-linear boundary maps $\delta_i$ and
$\delta_{i+1}$ of type

$V_{i+1} \xrightarrow{\ \delta_{i+1}\ } V_i \xrightarrow{\ \delta_i\ } V_{i-1}$

one can show that $\mathrm{im}(\delta_{i+1}) \subseteq \ker(\delta_i)$. Hence one can define the quotient $K$-vector space

(10) $\widetilde{H}_i(\Delta; K) := \ker(\delta_i) / \mathrm{im}(\delta_{i+1}).$

Often only the dimension of $\widetilde{H}_i(\Delta; K)$ is at stake. This being

$\dim(\ker(\delta_i)) - \dim(\mathrm{im}(\delta_{i+1})) = \dim(V_i) - \dim(\mathrm{im}(\delta_i)) - \dim(\mathrm{im}(\delta_{i+1})),$

it suffices to show how to calculate $\dim(\mathrm{im}(\delta_i))$. The latter is the rank of an $f_{i-1}$ by $f_i$ matrix $M$
whose columns are the "signed" supports (i.e. having alternating entries $\pm 1$) of the $i$-faces; see
e.g. [MS, Example 1.18]. The $i$-faces can be delivered in output-linear time as shown in Section
6. Calculating rank$(M)$ is subtle and depends a lot on $K$, but for many researchers it boils
down to a hardwired command in their favorite programming language.

7.3 According to [MS, p. 17] the link of a face $X$ of a simplicial complex $\Delta$ is

(11) $\mathrm{link}_\Delta(X) := \{Y \in \Delta : Y \cup X \in \Delta \text{ and } Y \cap X = \emptyset\},$

i.e. the set of faces that are disjoint from $X$ but whose union with $X$ is small enough to stay
in $\Delta$. Of course $\Delta(X) := \mathrm{link}_\Delta(X)$ is a simplicial complex itself. Simplicial complexes of type
$\mathrm{link}_\Delta(X)$ occur in many situations, e.g. when defining Betti numbers, or in Reisner's criterion
for a Stanley-Reisner ring to be Cohen-Macaulay.

We close this Section by indicating how $\mathrm{link}_\Delta(X)$ can be partitioned into multivalued rows. The
quick answer is that the facets of $\mathrm{link}_\Delta(X)$ are the maximal members $G_j$ of the set family

$\{F_i \setminus X : 1 \le i \le h,\ X \subseteq F_i\}$

⁸ In combinatorial commutative algebra an $i$-face is defined as having cardinality $i + 1$ (but dimension $i$). We
stick to our definition of cardinality $i$.
and so applying the method of Section 4 to the sets $G_j$ does the job. However, an existing
partition of $\Delta$ can be exploited to get a partition of $\mathrm{link}_\Delta(X)$ faster. To do so define

$\mathrm{Disjoint}(\Delta, X) := \{Y \in \Delta : Y \cap X = \emptyset\},$
$\mathrm{Minus}(\Delta, X) := \{Z \setminus X : Z \in \Delta,\ X \subseteq Z\}.$

Albeit a member of $\mathrm{Minus}(\Delta, X)$ needs not be a face of $\mathcal{SC}$, it is clear that

(12) $\mathrm{link}_\Delta(X) = \mathrm{Disjoint}(\Delta, X) \cap \mathrm{Minus}(\Delta, X).$

As will be seen, $\Delta = r_1 \uplus \cdots \uplus r_R$ readily spawns partitions

$\mathrm{Disjoint}(\Delta, X) = \rho_1 \uplus \cdots \uplus \rho_a$, and $\mathrm{Minus}(\Delta, X) = \sigma_1 \uplus \cdots \uplus \sigma_b$,

and so it follows from (12) that

(13) $\mathrm{link}_\Delta(X) = \biguplus \{\rho_i \cap \sigma_j : i \in [a],\ j \in [b]\}.$

It remains to write all nonempty $\rho_i \cap \sigma_j$ as a disjoint union of multivalued rows.

To fix ideas, take the simplicial complex $\Delta = \mathcal{SC}$ from Section 2 which is $\Delta = r_1 \uplus \cdots \uplus r_7$
according to Table 3. For the face $X = \{6, 7, 10, 11\}$ one verifies that $\mathrm{Disjoint}(\Delta, X) = \overline{r}_1 \uplus \overline{r}_2 \uplus
\overline{r}_3 \uplus \overline{r}_4$ and $\mathrm{Minus}(\Delta, X) = r'_1 \uplus r'_4 \uplus r'_6$ (see Table 5). Seven of the $4 \cdot 3 = 12$ intersections $\overline{r}_i \cap r'_j$
are empty, the other five happen to be expressible as single multivalued rows as shown in Table
5. According to (13), $\mathrm{link}_\Delta(X)$ is the disjoint union of these rows.
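The "quick answer" about the facets of the link can be checked directly against definition (11). The sketch below works with plain sets rather than multivalued rows; `link_facets` and `link_by_definition` are names we introduce for illustration only.

```python
from itertools import combinations

# Facets F_1, ..., F_6 of the example complex from Section 2.
FACETS = [
    {1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14},
    {1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14},
    {1, 2, 3, 4, 5, 8, 9, 10, 13, 14},
    {1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12},
    {9, 10, 11, 12, 13, 14},
    {1, 2, 6, 7, 9, 10, 11, 14},
]


def link_facets(facets, x):
    """Maximal members of {F_i minus X : X is a subset of F_i}."""
    x = frozenset(x)
    candidates = {frozenset(f) - x for f in facets if x <= frozenset(f)}
    return [c for c in candidates if not any(c < d for d in candidates)]


def link_by_definition(facets, x, ground):
    """All Y disjoint from X such that Y union X lies in some facet, as in (11)."""
    x = frozenset(x)
    rest = sorted(set(ground) - x)
    link = set()
    for k in range(len(rest) + 1):
        for c in combinations(rest, k):
            y = frozenset(c)
            if any(y | x <= frozenset(f) for f in facets):
                link.add(y)
    return link


if __name__ == "__main__":
    for g in link_facets(FACETS, {6, 7, 10, 11}):
        print(sorted(g))
```

For $X = \{6, 7, 10, 11\}$ exactly the facets $F_1$, $F_4$ and $F_6$ contain $X$, which is consistent with the three rows $r'_1, r'_4, r'_6$ making up $\mathrm{Minus}(\Delta, X)$ in the text.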

In general the intersection of two multivalued rows $\rho_i$ and $\sigma_j$ is handled by imposing the $e$-bubbles
(and 1's) of one row upon the other. If merely $|\rho_i \cap \sigma_j|$ is required this can be obtained with
inclusion-exclusion, similarly to Section 6. Namely, pick the row with the fewer $e$-bubbles. As
opposed to Section 6 here the $e$-bubbles are disjoint, so they are few and inclusion-exclusion is
fast.





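                                     1    2    3    4    5    6    7    8    9   10   11   12   13   14

\overline{r}_1                  =    2    2    0    0    2    0    0    2    0    0    0    2    2    2
\overline{r}_2                  =    2    2   e_1  e_1   0    0    0    2   e_1   0    0    2    2    2
\overline{r}_3                  =    2    2   e_1  e_1   1    0    0    2   e_1   0    0    0    2    2
\overline{r}_4                  =    2    2   e_1  e_1   1    0    0    0   e_1   0    0    1    0    0

r'_1                            =    2    2    0    0    2    0    0    2    0    0    0    2    2    2
r'_4                            =    2    2   e_1  e_1   2    0    0    0   e_1   0    0    2    0    0
r'_6                            =    2    2    0    0    0    0    0    0    1    0    0    0    0    1

\overline{r}_1 \cap r'_1        =    2    2    0    0    2    0    0    2    0    0    0    2    2    2
\overline{r}_2 \cap r'_4        =    2    2   e_1  e_1   0    0    0    0   e_1   0    0    2    0    0
\overline{r}_2 \cap r'_6        =    2    2    0    0    0    0    0    0    1    0    0    0    0    1
\overline{r}_3 \cap r'_4        =    2    2   e_1  e_1   1    0    0    0   e_1   0    0    0    0    0
\overline{r}_4 \cap r'_4        =    2    2   e_1  e_1   1    0    0    0   e_1   0    0    1    0    0

Table 5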
8 Application to frequent set mining



A property $\pi$ defined on the subsets $X \subseteq W$ is called monotone if with $X$ each subset of
$X$ enjoys $\pi$. Evidently the set of all $X$'s that enjoy $\pi$ constitutes a simplicial complex. As to
one particular monotone property to be dealt with, fix $\mathcal{D} \subseteq \mathcal{P}(W)$ and an integer $s \ge 1$. Then
$X$ is called $s^+$-frequent (with respect to $\mathcal{D}$) if $X$ is a subset of at least $s$ members $T \in \mathcal{D}$. For
instance, let $W = [4]$ and $\mathcal{D} := \{\{1, 2, 3\}, \{1, 2, 4\}, \{3, 4\}\}$. Then the simplicial complex $\mathcal{SC}$ of
all $2^+$-frequent sets is $\mathcal{SC} = \{\emptyset, \{1\}, \{2\}, \{3\}, \{4\}, \{1, 2\}\}$.
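The small example can be reproduced by counting supports exhaustively. This is a brute-force sketch over all subsets of $W$, meant only to make the definition concrete; it is neither the A priori algorithm nor Dualize and Advance, and the function names are ours.

```python
from itertools import combinations


def support(x, database):
    """Number of transactions T in the database with X a subset of T."""
    return sum(1 for t in database if x <= t)


def frequent_sets(ground, database, s):
    """All subsets of the ground set contained in at least s transactions."""
    result = []
    items = sorted(ground)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            if support(set(c), database) >= s:
                result.append(frozenset(c))
    return result


if __name__ == "__main__":
    database = [{1, 2, 3}, {1, 2, 4}, {3, 4}]
    for x in frequent_sets({1, 2, 3, 4}, database, 2):
        print(sorted(x))
```

Because the property is monotone, the returned family is closed under taking subsets, i.e. it is a simplicial complex.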

In the framework of frequent set mining one refers to $\mathcal{D}$ as the database and to its elements $T \in \mathcal{D}$
as transactions. For instance, the transactions could comprise the items bought by customers in
a supermarket during a specific period of time. Hence an itemset $X$ is $s^+$-frequent if its items
have been bought together at least $s$ times. We write $\mathcal{F}r(s^+)$ for the simplicial complex of all
$s^+$-frequent sets. If $\mathcal{F}r(s^+)$ is small, its faces can be enumerated one by one, and in the early
days of frequent set mining the so called A priori algorithm got famous for doing just that: it
even made it to the Top Ten Algorithms in Data Mining [WK, ch. 4]. For large $\mathcal{F}r(s^+)$ it may
still be desirable to have all of $\mathcal{F}r(s^+)$ available, but then $\mathcal{F}r(s^+)$ must be encoded in compact
form, somehow. As seen in Sections 4 and 5, this can be done if either the facets or the minimal
non-faces of $\mathcal{F}r(s^+)$ are available. Fortunately the facets (alternatively: the minimal non-faces)
can be calculated from $\mathcal{D}$ with the algorithm Dualize and Advance and its clever improvements.
The facets alone are useful enough in some situations; e.g. they readily yield $|\mathcal{F}r(s^+)|$ (Section
2).

In the remainder of this section we focus on the calculation of certain numbers attached to
$\mathcal{F}r(1^+), \mathcal{F}r(2^+)$, and so on. Put $\mathcal{D} = \{T_1, T_2, \ldots, T_m\}$. For simplicity assume that $\mathcal{D}$ is an
antichain. Then the facets of $\mathcal{SC} := \mathcal{F}r(1^+)$ are just $T_1$ up to $T_m$. We are going to slice $\mathcal{SC}$ in
two ways. First, cardinality-wise as

(14) $\mathcal{SC} = \mathcal{SC}[0] \uplus \mathcal{SC}[1] \uplus \cdots \uplus \mathcal{SC}[\gamma],$

where $\gamma$ is the maximum cardinality of a facet. Second, frequency-wise as

(15) $\mathcal{SC} = \mathcal{F}r(1) \uplus \mathcal{F}r(2) \uplus \cdots \uplus \mathcal{F}r(m),$

where $\mathcal{F}r(s) := \mathcal{F}r(s^+) \setminus \mathcal{F}r((s+1)^+)$ is the family of all $s$-frequent⁹ sets in the sense that they
occur in exactly $s$ transactions. Furthermore, let

$fr(s, k) := |\mathcal{F}r(s) \cap \mathcal{SC}[k]| \quad (0 \le k \le \gamma,\ 1 \le s \le m)$

be the number of $k$-element subsets of $W$ which are $s$-frequent. These numbers can be computed
as

$fr(s, k) = fr(s^+, k) - fr((s+1)^+, k)$

where

$fr(s^+, k) :=$ number of $s^+$-frequent sets of cardinality $k$.

⁹ Be aware that in the frequent set mining literature usually "$s$-frequent" corresponds to our $s^+$-frequent.

Dually to $fr(s^+, k)$ we define

$fr(s, k^+) :=$ number of $s$-frequent sets of cardinality $\ge k$.

Obviously

$fr(s, k^+) = fr(s, k) + fr(s, k+1) + \cdots + fr(s, w).$

Thus everything hinges on the numbers $fr(s^+, k)$. They can be calculated as in Section 3 provided
we have the facets of all simplicial complexes $\mathcal{F}r(s^+)$ $(1 \le s \le m)$. To obtain them Dualize and
Advance can be employed, and one should also exploit the fact that the facets of $\mathcal{F}r(s^+)$ help
to find the facets of $\mathcal{F}r((s+1)^+)$. The matter needs further investigation.
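To make the bookkeeping of $fr(s, k)$, $fr(s^+, k)$ and their relation concrete, here is a brute-force sketch for small databases. It counts supports of all subsets directly instead of going through facets and multivalued rows as the text proposes; `exact_frequency_table` and `at_least` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations


def exact_frequency_table(ground, database):
    """Map (s, k) to the number of k-subsets lying in exactly s transactions."""
    table = Counter()
    items = sorted(ground)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            x = set(c)
            s = sum(1 for t in database if x <= t)
            table[(s, k)] += 1
    return table


def at_least(table, s, k):
    """fr(s^+, k): number of k-subsets lying in at least s transactions."""
    return sum(v for (s2, k2), v in table.items() if s2 >= s and k2 == k)


if __name__ == "__main__":
    database = [{1, 2, 3}, {1, 2, 4}, {3, 4}]
    table = exact_frequency_table({1, 2, 3, 4}, database)
    for (s, k) in sorted(table):
        print(s, k, table[(s, k)])
```

The table also records sets with $s = 0$, which the text excludes from $\mathcal{SC} = \mathcal{F}r(1^+)$ but which reappear in the row $s = 0$ of Table 7.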

To fix ideas, consider the concrete database $\mathcal{D} = \{T_1, \ldots, T_7\}$ given by:

Table 6 (rows: the transactions $T_1, \ldots, T_7$; columns: the items $1, \ldots, 9$; an entry x marks that the item belongs to the transaction)



One calculates these associated numbers $fr(s, k)$ for $0 \le s \le 7$ and $1 \le k \le 9$:

  s \ k            1     2     3     4     5     6     7     8     9     sum

  0                0     2    23    69    97    76    35     9     1     312
  1                0    13    44    53    29     8     1     0     0     148
  2                2    11    12     3     0     0     0     0     0      28
  3                2     4     4     1     0     0     0     0     0      11
  4                2     6     1     0     0     0     0     0     0       9
  5                2     0     0     0     0     0     0     0     0       2
  6                1     0     0     0     0     0     0     0     0       1
  7                0     0     0     0     0     0     0     0     0       0

  \binom{9}{k}     9    36    84   126   126    84    36     9     1     511

Table 7



For instance, because of $fr(3, 3) = 4$ there are exactly four 3-frequent sets of cardinality 3, one
of which is $\{2, 7, 9\}$ (indicated boldface in Table 6). Many probabilities can be calculated
from Table 7. Say, the probability that a random 3-element itemset is $2^+$-frequent is

$\frac{12 + 4 + 1}{\binom{9}{3}} \approx 0.202,$

whereas the probability that a random $2^+$-frequent set has 3 elements is

$\frac{12 + 4 + 1}{28 + 11 + 9 + 2 + 1} \approx 0.333.$

Similarly one reads from Table 7 that the probability of a $2^+$-frequent set $X$ to be 2-frequent is
$\frac{28}{51} \approx 0.549$. If one additionally requires that $|X| \ge 2$ or $|X| \ge 3$ the corresponding probabilities
obviously increase, in fact they are 0.619 and 0.714.